Abstract

Great cities connect people; failed cities isolate people. Despite the fundamental importance of physical, 
face-to-face social-ties in the functioning of cities, these connectivity networks are not explicitly observed in their 
entirety. Attempts at estimating them often rely on unrealistic over-simplifications such as the assumption of spatial 
homogeneity. Here we propose a mathematical model of human interactions in terms of a local strategy of maximising 
the number of beneficial connections attainable under the constraint of limited individual travelling-time budgets. 
By incorporating census and openly-available online multi-modal transport data, we are able to characterise the 
connectivity of geometrically and topologically complex cities. Beyond providing a candidate measure of greatness, 
this model allows one to quantify and assess the impact of transport developments, population growth, and other 
infrastructure and demographic changes on a city. Supported by validations of GDP and HIV infection rates 
across United States metropolitan areas, we illustrate the effect of changes in local and city-wide connectivities by 
considering the economic impact of two contemporary inter- and intra-city transport developments in the United 
Kingdom: High Speed Rail 2 and London Crossrail. This derivation of the model suggests that the scaling of different 
urban indicators with population size has an explicitly mechanistic origin. 


1 Introduction 

Can the greatness of a city be quantified? The city of Nineveh, capital of the Neo-Assyrian empire of 911-627 BC, was once described as "an exceedingly great city, three days' journey in breadth" [?]. Today, a city described as such would more likely be dismissed as an urban sprawl let down by an inefficient transport infrastructure. Without reference to travelling-time constraints, size is clearly not a sufficient measure of greatness, just as rank and title can be poor predictors of influence in social networks [?]. Of the many candidates [?], the simplest objective measure of success is, possibly, the extent to which a city fulfils its primary purpose of maximising the number of face-to-face, opportunity-spawning interactions between its inhabitants [?]. From the rise of the Medici in 15th-century Florence to the prestige of an efficient transport system in a 21st-century metropolis, this connectivity is synonymous with both the eminence of individuals and the success of whole cities [?].

Measuring this connectivity, however, is not straightforward. Despite the success of social theory and experiments in much smaller contexts [?], the number of face-to-face social ties in a city, unlike secondary socio-economic indicators, remains poorly estimated. Beneath the reductionist representation of cities as featureless groups of individuals lies a forbidding real-world diversity [?], including widely differing population sizes (spanning several orders of magnitude), distributions (uniform, polycentric [?]), topologies and geometries, the latter covering both geography (boundaries, natural features) and the different modalities of transport infrastructure (rail networks, traffic) [16]. In addition, cultural and activity-specific behavioural differences (e.g. travelling-time tolerances) are a complicating factor in theories of urban human interactions.
A typical strategy is to ignore this heterogeneity in favour of simple summary statistics such as population size [?], density [?], congestion sensitivity [1] or a global fractional dimensionality [?]. However, cities that differ significantly in any of the excluded characteristics then simply cannot be compared with these models. Of particular significance to city planners, such models are, for the same reasons, unsuitable for assessing the impact of complex infrastructure or demographic changes.

The parsimony of such approaches is, nevertheless, not without merit. Most notably, there is an apparent common scaling with respect to population size across a wide range of urban indicators [20]. However, this empirical scaling is similar but not identical across indicators, both in the scaling exponent $\beta$ and in the level of statistical support (e.g. US 2002 new AIDS cases exhibit a power law against population with exponent $\beta = 1.23$ and adjusted $R^2 = 0.76$, while private R&D employment has $\beta = 1.34$ with adjusted $R^2 = 0.92$) [17]. Furthermore, power-law relationships can also arise by chance or as statistical artefacts, and even when supported by data they are largely descriptive and do not constitute constructive mechanistic narratives [?]. Indeed, recent attempts (such as [12], [23], [?]) to lift this science of cities above the level of descriptive statistics reflect a growing desire for more generative and explanatory models.
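Exponents $\beta$ and adjusted $R^2$ values such as those quoted above are typically obtained by fitting $U = c\,N^{\beta}$ with ordinary least squares on log-transformed data. A minimal sketch of such a fit, assuming plain OLS in log-log space (the helper `fit_power_law` and the synthetic data are our own illustration, not the estimator used in [17]):

```python
import math


def fit_power_law(population, indicator):
    """Fit indicator = c * population**beta by OLS on log-transformed data.

    Returns (beta, log_c, adjusted_r2). Illustrative only: the cited studies
    may use a different estimator or error model.
    """
    x = [math.log(p) for p in population]
    y = [math.log(u) for u in indicator]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    log_c = mean_y - beta * mean_x
    ss_res = sum((yi - log_c - beta * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R^2 with a single explanatory variable (log population).
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    return beta, log_c, adj_r2


# Synthetic example: a noisy power law with beta = 1.23 (made-up numbers).
pops = [1e5, 2e5, 5e5, 1e6, 2e6, 5e6, 1e7]
noise = [1.05, 0.97, 1.02, 0.94, 1.03, 0.99, 1.01]
cases = [2e-4 * p ** 1.23 * e for p, e in zip(pops, noise)]
beta, log_c, adj_r2 = fit_power_law(pops, cases)
print(f"beta = {beta:.3f}, adjusted R^2 = {adj_r2:.3f}")
```

As the surrounding text stresses, a good fit of this kind is descriptive: it summarises the data but says nothing about the mechanism that produced them.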

A major step in this direction was taken by Pan et al. in [18], where the observations behind the super-linear scaling relations were shown to be entirely consistent with, and actually better modelled by, the more fundamental assumption that the probability of social-tie formation between two individuals is inversely proportional to the number of people in closer proximity. In spite of the arbitrary nature of the probability ansatz, this elegant reduction of purely phenomenological power-law statistical observations to a statement about the likelihood of interactions between pairs of individuals suggests the existence of an underlying set of behavioural principles governing the formation of the network of social ties in a city.

In this paper we propose one such set of rules. These rules are 'parameter-free' in the sense that they do not depend on any arbitrary functional assumptions beyond several intuitive statements on human behaviour. From them we build a model for real-world deliberate (as opposed to accidental or serendipitous) social interactions that is derived solely from this set of agent-driven principles and is, therefore, by design, truly mechanistic. In particular, via our derivation from first principles, we show how the probability of social-tie formation originally proposed in [25] can be viewed as an emergent consequence of these more fundamental and, crucially, mechanistic principles. On the practical side, the model readily incorporates available detailed demographic, transportation and economic data, thereby providing a tool for the a priori assessment of the effectiveness of planned infrastructure measures.

2 A model of deliberate social ties 

2.1 Modelling principles 

We start with four principles, the justification for and mathematical implications of which we will shortly unpack:

1. Individuals are characterized by a set of attributes 
(heterogeneity). 

2. For each attribute, individuals seek out social ties 
only with others who have higher attribute values 
(utility optimisation). 

3. Individuals have a set of attribute-specific travelling-time budgets $T_{\max}$ (resource constraints).

4. A directed tie is formed only if there are no closer 
and better opportunities in the proximity of the 
seeker (intervening opportunities). 

Heterogeneity. The first principle is a nod to the variety of city life. Besides a multitude of attributes, ranging from objective (e.g. wealth) to subjective (e.g. beauty) and from beneficial (e.g. artistic skills) to harmful (e.g. criminality), there exists a spectrum of skills and levels in those attributes across the population. To represent this heterogeneous set of attributes we define a set of non-identically distributed random variables

$$\{X, Y, Z, \dots\}. \tag{1}$$

Each set of realisations $\{x, y, z, \dots\}$ then represents an individual's set of abilities and scores in the corresponding attributes.

Utility optimisation. The second principle is a statement of human endeavour, whereby one seeks to build beneficial ties. It is simply a variation on the theory of rational choice, in which individuals are deemed to act in their own perceived best interest [26]. For a given attribute $Z$, we express this necessary condition for a directed social tie from person $i$ to person $j$ as

$$(i \to j)_Z \Rightarrow z^{(j)} > z^{(i)}. \tag{2}$$

Resource constraints. The third principle reflects the finite nature of individual resources by adopting the concept of the travelling-time budget $T_{\max}$, that is, the maximum amount of time a person is willing to spend on a single commuting trip. There are several reasons for the key role it plays in the model. First, instead of Euclidean distances between geographical locations, a more faithful representation of a city's geometry is the set of real travelling times along the spatially embedded, multi-layered transportation network between individuals (see, for example, [?]). Second, there is increasing evidence that the relevant measure for the formation of social ties is $T_{\max}$ rather than the spatial separation between pairs of individuals (see [28] for a critical overview). In particular, it has been shown that in cities all across the world with high multimodal commuting behaviours, there is a uniformity in commute times that is independent of travel distance [29].

Here, instead of imposing a single, universal $T_{\max}$, as was done in [?], we allow for a list of different budgets $T^{X}_{\max}, T^{Y}_{\max}, \dots$ to reflect the heterogeneity of differing priorities and motivation levels for different activities undertaken by a single, fixed population. For example, a city dweller who travels for three hours to attend an important business meeting might not be willing to spend more than 10 minutes on a weekly drive to the supermarket.

This principle gives us a necessary condition for the existence of a tie:

$$(i \to j)_Z \Rightarrow \tau_{ij} < T^{Z}_{\max}, \tag{3}$$

where $\tau_{ij}$ is the travelling-time distance between individuals $i$ and $j$.

Intervening opportunities. The fourth principle represents the search heuristic that a person employs to perform constrained optimisation and is the defining geometric ingredient of our model. Each potential face-to-face interaction implies a minimal path defined by the shortest connecting travel route, which, in turn, defines a temporal social sphere within which one evaluates the merit of the candidate interaction against other less costly options. These temporal spheres $S_{ij}$ are simply the sets of people that are closer to individual $i$ than another individual $j$, i.e. in a city of population size $N_{\mathrm{pop}}$,

$$S_{ij} := \{\, k \in \{1, \dots, N_{\mathrm{pop}}\} \setminus \{i\} \mid \tau_{ik} < \tau_{ij} \,\}, \tag{4}$$

with their cardinalities defining the components of the rank matrix¹

$$n_{ij} := |S_{ij}|. \tag{5}$$

Then, we can express a third necessary condition for a directed social tie as

$$(i \to j)_Z \Rightarrow z^{(j)} > \max_{k \in S_{ij}} z^{(k)}. \tag{6}$$

In studies of human mobility, the consideration of such intervening opportunities has been shown to be key to understanding travel patterns between cities [?]. This fourth principle of our model is entirely consistent with, and supports, the growing body of evidence linking mobility and social contact patterns in cities [24].

As will be shown in the next section, these four principles, together with an assumption about, or prior knowledge of, the spatial distribution of attribute values amongst the population, are sufficient to construct a weighted, directed network, with the nodes $\{i, j, \dots\}$ representing a city's inhabitants and edge weights $\{\mathrm{Prob}(i \to j)_Z\}$ representing the probabilities of social ties between individuals. This probability network encapsulates the different levels of heterogeneity (attributes, geometry, topology, transport modality and spatial population distribution) in our model of a city.

¹ Note that, in general, $n_{ij} \neq n_{ji}$.

From this probability network one can extract a host of statistics relevant to the problem at hand. Below we focus on the expected degree, i.e. the expected number of social ties of individuals in a city, which we take as a first measure of connectivity and which turns out to be a strong predictor for several urban indicators.

2.2 Counting social ties 

By design of the model, the three conditions (2), (3) and (6) are together sufficient for the formation of the social tie $(i \to j)_Z$. The probability $\mathrm{Prob}(i \to j)_Z$ is, therefore, simply the probability that those three conditions are satisfied.

We begin by setting $T^{Z}_{\max} \to \infty$, before reintroducing a finite $T_{\max}$ at a later stage. Then, by reasoning similar to that behind the radiation mobility model [30], we have

$$\mathrm{Prob}(i \to j)_Z = \mathrm{Prob}\left(z^{(j)} > z^{(i)}\right) \times \mathrm{Prob}\left(z^{(j)} > \max_{k \in S_{ij}} z^{(k)}\right). \tag{7}$$

As we show in the Supp. Mat. (see [S1]-[S5]), this equation can be simplified to give

$$\mathrm{Prob}(i \to j) = \cdots, \tag{8}$$

i.e. in the absence of travelling-time budget constraints, the probability of a social tie is entirely determined by the rank matrix $n_{ij}$ and is the same for all attributes (hence the dropped $Z$ label).

This probability expression is, for large $n_{ij}$, virtually equivalent to the proposal $\mathrm{Prob}(i \to j) = 1/n_{ij}$ as introduced in [?] and developed in [?]. Crucially, however, we have shown that it can in fact be derived directly from first principles, and that it is naturally regularised, being well-defined when $n_{ij} = 0$ without the need for artificial and arbitrarily imposed constraints on the minimum sizes of social spheres (see [15]). Remarkably, the attribute dependency retained at the beginning of our derivation drops out naturally from the final expression; our model is, therefore, a non-trivial instance of a probabilistic and mechanistic social interaction model consistent with observations of emergent urban-feature independence [?].
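Because the derivation of (8) is deferred to the Supp. Mat., a quick numerical check can make the argument concrete. The hedged sketch below does not reproduce (8); it estimates the joint probability of conditions (2) and (6) directly by sampling, assuming i.i.d. standard log-normal attribute values (the distribution used in Section 3.2) and the reading of (6) in which $z^{(j)}$ must exceed every value in $S_{ij}$. For i.i.d. continuous values this joint event says that $z^{(j)}$ is the largest of $n_{ij} + 2$ draws, so the estimates should approach $1/(n_{ij} + 2)$ whatever the attribute distribution; the function name is hypothetical.

```python
import random


def estimate_tie_probability(n_ij, trials=50_000, seed=1):
    """Monte Carlo estimate of Prob(i -> j) without a travelling-time budget.

    z_i, z_j and the n_ij values in S_ij are i.i.d. standard log-normal; a tie
    requires z_j > z_i (condition 2) and z_j above every value in S_ij
    (condition 6, as read here).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        z_i = rng.lognormvariate(0.0, 1.0)
        z_j = rng.lognormvariate(0.0, 1.0)
        # Draw the social-sphere values lazily; all() stops at the first failure.
        sphere = (rng.lognormvariate(0.0, 1.0) for _ in range(n_ij))
        if z_j > z_i and all(z_j > z_k for z_k in sphere):
            hits += 1
    return hits / trials


for n in (0, 1, 5, 20):
    print(n, round(estimate_tie_probability(n), 3), round(1.0 / (n + 2), 3))
```

Swapping `lognormvariate` for any other continuous distribution leaves the estimates unchanged up to sampling noise, which mirrors the attribute independence noted above.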

Clearly, then, the key input of the model is the travelling-time distance matrix $\tau_{ij}$, from which one builds the rank matrix $n_{ij}$. The data required for constructing $\tau_{ij}$ are often public and readily available online through a variety of tools², as demonstrated in the application examples in Section 4.

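The expected total number of ties $T_Z$ corresponding to an attribute $Z$ in a population of size $N_{\mathrm{pop}}$ is then simply the sum of each individual's set of tie probabilities, restricted to partners within a finite budget $T^{Z}_{\max}$, i.e.

$$T_Z = \sum_{i=1}^{N_{\mathrm{pop}}} \sum_{j \neq i} \mathrm{Prob}(i \to j)\, \mathbb{1}\left(\tau_{ij} < T^{Z}_{\max}\right). \tag{9}$$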


Although technically correct, building the distance matrix $\tau_{ij}$ covering the entire population is highly impractical for all but the smallest of cities. Instead, we subsample the geographical extent of the city at $N_s \ll N_{\mathrm{pop}}$ points to generate the much smaller sample distance matrix $\hat{\tau}_{ij}$. From this coarse-grained representation of the city, we obtain the approximation

$$T_Z \approx \frac{N_{\mathrm{pop}}}{N_s} \sum_{i=1}^{N_s} \ln n_i^{Z} + \cdots, \tag{10}$$

where $n_i^{Z} := |\{k \mid \hat{\tau}_{ik} < T^{Z}_{\max}\}|$ is the size of the social sphere, as related to attribute $Z$, of location $i$ in the subsampled city (see Supp. Mat. for the derivation of this approximation and the definition of the remaining term). In the following section, we show through a series of simulations that this approximation is both unbiased and robust.

For the remainder of the paper, we drop the $Z$ label for notational clarity.


2.3 Local connectivity

The total number of ties $T$ is a global, city-wide connectivity measure which encapsulates the intricate complexities of the city geometry and heterogeneities in agent attributes. Our model also offers a measure that captures the spatial variation in tie formation across a city. We introduce the concept of the local connectivity of some sub-region of a city as the sum of all incoming and outgoing ties. Let $T_i$ represent the local connectivity at the location of individual $i$, such that $T = \sum_i T_i$. Then

$$T_i = \cdots, \tag{11}$$

where $\alpha = N_{\mathrm{pop}}/N_s$ and $\gamma$ is a scaling factor that ensures, for consistency, that $T = \sum_{i=1}^{N_s} T_i$ (for a full derivation see Supp. Mat.).

The distribution of $T_i$ reflects the heterogeneity of the induced interaction network (see Supp. Mat. Figure S3d). In particular, it enables one to quantify the distinct and disproportionate influence that transportation and other infrastructure schemes can have in different parts of the city, as we show in an example in Section 4.2 below.

² e.g. Google Distance Matrix API, MapQuest Route Matrix, Microsoft Bing Routes API.

2.4 Relating social-tie connectivity with other measurable indicators

Our underlying assumption is that there is a link between the attribute-specific social-tie connectivity $T$, as defined in (10), and a measure $U$ of a related productive urban activity:

$$U = f(T) = a_0 + a_1 T + a_2 T^2 + \cdots. \tag{12}$$

$U$ can correspond to socio-economic measures such as GDP, innovation indices, etc. We are primarily interested here in scenarios where the contribution of individual, isolated efforts is either non-existent (e.g. spreading of disease) or negligibly small (e.g. collaborative scientific research output). In such cases, $a_0 = 0$. As a first approximation, we consider here a simple proportional relation with $a_{i>1} = 0$, which often provides reasonably good explicative power (see [?] and [32]). For example, if the probability $p$ of disease transmission in a single encounter between an infected and a susceptible individual is small (e.g. the sexual per-act HIV transmission risk is < 0.014 [33]), then within a relatively short timeframe the expected number of new infection cases given $T$ such interactions is simply $pT$. We, therefore, define our relation to be simply

$$U = aT, \tag{13}$$

with $a \in \mathbb{R}$ the single unknown parameter relating connectivity and its related activity measure. In situations where the first-order approximation breaks down, the networks of social ties generated through our model allow the use of higher-order statistics beyond the average degree, which could be used to test hypotheses against (12). We discuss this point further at the end of the paper (see also Supp. Mat., where we discuss the expected degree distribution).

In summary, there are just two parameters in the model: the constant of proportionality $a$ and, implicit in the computation of $T$, the travelling-time budget $T_{\max}$. We emphasise that these parameters have precise meanings in the model, i.e., they are not just post hoc adjustable tuning levers, and that they can be inferred from data to characterise the dynamics and the implications of human interactions contained in the observations (for an example, see Section 3.4). Alternatively, the parameters, $T_{\max}$ in particular, can be fixed using prior knowledge, such as travel behaviour surveys, information from similar cities, or crowd-sourced location data. Furthermore, under the linear assumption, the typical exercise of comparing scenarios (e.g. the relative increase of economic activity before and after the completion of a new railway) affords a further simplification, as the parameter $a$ cancels out when taking ratios.
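Under the linear relation (13), comparing two scenarios only requires the two tie counts, since $U_{\text{after}}/U_{\text{before}} = T_{\text{after}}/T_{\text{before}}$ for any $a$. A trivial sketch with hypothetical tie counts (the numbers and function names are placeholders, not model output):

```python
def predicted_indicator(a, ties):
    """Linear relation (13): U = a * T."""
    return a * ties


def percentage_change(ties_before, ties_after):
    """Percentage change in U between two scenarios; a cancels in the ratio."""
    return 100.0 * (ties_after / ties_before - 1.0)


# Hypothetical tie counts before and after a transport scheme (placeholders).
ties_before, ties_after = 4.20e7, 4.24e7
for a in (0.5, 2.0, 1.0e3):  # three arbitrary proportionality constants
    ratio = predicted_indicator(a, ties_after) / predicted_indicator(a, ties_before)
    print(f"a = {a:g}: U_after / U_before = {ratio:.6f}")
print(f"change: {percentage_change(ties_before, ties_after):.3f}%")
```

The printed ratio is identical for every $a$, which is why scenario comparisons depend only on $T_{\max}$ through the tie counts.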


3 Validation of the social-tie model


The mathematical model above formalises the hypothesis-driven narrative stemming from our set of agent-driven behavioural principles, and represents a possible mechanistic process of face-to-face communication within a general population together with its city-level phenomenological implications. To check the implications of the model, we have performed a set of simulations and empirical validations.

We begin by validating the procedure to obtain $T$, the total number of ties. There are two separate aspects to consider: (i) the statistical validity of the sampling approximation (10) for the population-level $T$; and (ii) the validity of the rank-based formula (8) for the probability of a tie between two individuals given the four principles in our model. We examine both parts together in a single set of simulations, as described below.


3.1 Statistical surrogates of cities with multi-modal mobility


To test our model, we generate multiple surrogate cities and the corresponding travelling-time matrices under multi-modal transport networks. These simulated cities are designed to model real-world urban mobility patterns involving multiple transport modes. We consider four population sizes, $N_{\mathrm{pop}} \in \{300, 500, 800, 1200\}$, with five different population distributions (a uniform distribution over a 45 × 45 km square area, and a two-dimensional, circularly symmetric Gaussian distribution with standard deviations of 3, 6, 9 and 12 km) and two travelling-time budgets ($T_{\max}$ = 1 and 2 hours).

To simulate the multi-modal transportation infrastructure we proceed as follows. For each pair of individuals $i, j$ in our simulated city, we compute the Euclidean spatial distance $\delta_{ij}$ and decompose it into binary form:

$$\delta_{ij} \approx \delta_{ij}^{(0)}\, 2^{0} + \delta_{ij}^{(1)}\, 2^{1} + \delta_{ij}^{(2)}\, 2^{2} + \cdots, \tag{14}$$

where $\delta_{ij}^{(k)} \in \{0, 1\}$. The multi-modal transport network is represented by a speed vector $\mathbf{v} = (v_0, v_1, \dots, v_m)$, where each component is the speed of a certain transportation mode, in order of increasing speed, $v_{k+1} > v_k$. We then generate the travelling-time distance matrix $\tau_{ij}$ between all pairs of points in the city as

$$\tau_{ij} = \sum_{k=0}^{m} \frac{\delta_{ij}^{(k)}\, 2^{k}}{v_k}. \tag{15}$$


This framework for the simulation of travelling times replicates two features of modern-day transport infrastructure, as illustrated in Figure ??. First, there is the hierarchical nature of travelling speeds, with faster transport modes covering larger distances. Second, the framework allows for the fact that travel between two locations in a city typically involves a combination of transport modes (e.g. bus + train). The slowest mode of transportation is given by $v_0$ = 4 km/h. A city with no transport infrastructure is represented by a vector $\mathbf{v} = (4, \dots, 4)$, and the time between nodes is then the time taken to walk the spatial separation distance. A more realistic case, in which public transportation modes of walking, bus and train networks are considered, is represented by $\mathbf{v} = (4, 10, \dots, 100)$. If private travel is considered, different classes of roads and expressways traversed using bicycles or automobiles could be included. In our simulations, we considered four different transport infrastructures, as shown in Table ??.

In summary, four population sizes, five distributions, two travelling-time budgets and three non-trivial transportation infrastructures give a total of 120 unique surrogate cities, each given by its specified distribution of $N_{\mathrm{pop}}$ points on a square 45 × 45 km grid and a resulting $N_{\mathrm{pop}} \times N_{\mathrm{pop}}$ travelling-time distance matrix $\tau_{ij}$.
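The surrogate-city construction above can be sketched as follows. This is our own minimal reading of (14)-(15), with several assumptions that are not stated in the text: distances are in km and rounded to an integer before the binary decomposition, bit $k$ is travelled with mode $k$, bits beyond the last mode reuse the fastest mode, and Gaussian cities are centred in the 45 × 45 km square with out-of-square draws rejected. The speed vector (4, 10, 30, 100) km/h is a hypothetical example, since the intermediate speeds are not listed here; all function names are ours.

```python
import math
import random


def travel_time(distance_km, speeds_kmh):
    """Travelling time in hours following eqs. (14)-(15) as read here."""
    d = int(round(distance_km))      # assumption: decompose whole kilometres
    hours, k = 0.0, 0
    while d > 0:
        bit = d & 1                  # delta^(k) in {0, 1}
        speed = speeds_kmh[min(k, len(speeds_kmh) - 1)]  # assumption: reuse fastest mode
        hours += bit * 2 ** k / speed
        d >>= 1
        k += 1
    return hours


def gaussian_city(n_pop, sigma_km, size_km=45.0, seed=0):
    """n_pop points from a circular 2-D Gaussian centred in a size_km square;
    draws falling outside the square are rejected (our assumption)."""
    rng = random.Random(seed)
    centre = size_km / 2.0
    points = []
    while len(points) < n_pop:
        x, y = rng.gauss(centre, sigma_km), rng.gauss(centre, sigma_km)
        if 0.0 <= x <= size_km and 0.0 <= y <= size_km:
            points.append((x, y))
    return points


def travel_time_matrix(points, speeds_kmh):
    """Dense N_pop x N_pop matrix tau_ij of travelling times (hours)."""
    return [[travel_time(math.dist(p, q), speeds_kmh) for q in points] for p in points]


walking = (4.0,) * 7                   # no transport infrastructure
multimodal = (4.0, 10.0, 30.0, 100.0)  # hypothetical speed vector, km/h
print(travel_time(5.0, walking))       # 5 km = 101 in binary: 1/4 + 4/4 hours
print(travel_time(5.0, multimodal))    # 1/4 + 4/30 hours
city = gaussian_city(300, sigma_km=6.0)
tau = travel_time_matrix(city, multimodal)
```

With the all-walking vector the decomposition collapses to distance divided by 4 km/h, reproducing the no-infrastructure case described above.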


3.2 Validation of the sampling procedure and probability model


To validate our sampling procedure (10), we compare the total number of ties computed from the travelling-time distance matrices (15) of our simulated cities, obtained from the whole population $N_{\mathrm{pop}}$ and from a reduced sample of $N_s = 150$ points, as follows. Every one of the 150 × 149 = 22350 possible directed ties in the sample is assigned a probability according to (8). The total number of ties in the sample is obtained by summing over the probabilities, and the sum is then scaled up according to (10).


In the simulation of the full population $N_{\mathrm{pop}}$, we take the viewpoint of each individual and rank the other $N_{\mathrm{pop}} - 1$ people in the population according to their travelling-time distances from that individual. We consider a population characterised by a single attribute, with individual attribute values drawn i.i.d. from a standard log-normal distribution. There are $N_{\mathrm{pop}}(N_{\mathrm{pop}} - 1)$ possible directed ties. Starting from the closest person, a directed tie from the individual is assigned according to the fourth modelling principle of intervening opportunities, subject to an upper bound $T_{\max}$ on the travelling time.
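The full-population procedure just described can be sketched as a single outward sweep per seeker. This is a hedged illustration, not the authors' code: it assumes one attribute, a dense travelling-time matrix and a single realisation of attribute values, and it visits candidates at exactly equal travelling times in index order (with continuous positions such coincidences have probability zero). The function name and the walking-only demo city are our own.

```python
import math
import random


def count_ties_full_population(tau, z, t_max):
    """Number of directed ties in one realisation of attribute values z.

    For each seeker i, candidates are visited in order of increasing travelling
    time. Candidate j receives a tie if tau[i][j] < t_max (3), z[j] > z[i] (2)
    and z[j] exceeds every candidate visited before it (6).
    """
    size = len(z)
    ties = 0
    for i in range(size):
        others = sorted((k for k in range(size) if k != i), key=lambda k: tau[i][k])
        best_closer = float("-inf")  # largest attribute value seen so far
        for j in others:
            if tau[i][j] >= t_max:
                break  # every remaining candidate is outside the budget
            if z[j] > z[i] and z[j] > best_closer:
                ties += 1
            best_closer = max(best_closer, z[j])
    return ties


# Demo: 200 people uniform on a 45 x 45 km square, walking at 4 km/h, T_max = 2 h.
rng = random.Random(42)
points = [(rng.uniform(0, 45), rng.uniform(0, 45)) for _ in range(200)]
tau = [[math.dist(p, q) / 4.0 for q in points] for p in points]
counts = [
    count_ties_full_population(tau, [rng.lognormvariate(0.0, 1.0) for _ in points], 2.0)
    for _ in range(10)
]
print("mean number of directed ties:", sum(counts) / len(counts))
```

Averaging over many attribute realisations gives the full-population benchmark against which the sampled estimate (10) would be compared.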


The results of the comparison between the full population and the sample are shown in Figure ??, and the close match demonstrates the validity of the probability model (8), as well as showing that the sampling procedure (10) provides a good and unbiased approximation.
3.3 Comparison with power-law scaling models

Using real-world data from US cities, we compare the predictive abilities of our model and of power-law scaling models [?]. We begin by generating travelling-time distance matrices on sampled representations of 102 United States Metropolitan Statistical Areas (US MSAs). The detailed information available³ on the population distributions in these MSAs allows us to construct sample distance matrices that are representative of the full population-scale distance matrices. We then plot the computed number of social ties $T$ (as a function of the travelling-time budget $T_{\max}$) from our model against two measures of urban activity $U$: the 2011 gross domestic product (GDP) and HIV infection rates⁴. We also make the comparison with the corresponding power laws against population density. As shown in Figure ??, the model is, on its own, well supported by the data, with a linear $\log U$-$\log T$ relationship with slope ≈ 1. Our social-tie model provides an equally good fit for the GDP case ($R^2 = 0.92$ (social ties) vs. 0.91 (power law)) and has significantly stronger statistical support than the power-law fit to population density in the HIV infection rate case ($R^2 = 0.94$ vs. 0.70). Much of this improvement stems from the shift from counting people to counting ties, specifically ties between HIV-positive and HIV-negative individuals (see Supp. Mat.). The overly broad category of a city's economic output and the lack of specificity in the nature of such relationships explain the relatively marginal improvement in statistical support in the GDP example. Together, the examples support the view that the fundamental units of a city are not its inhabitants but the social relationships that exist between them.
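Section 3.4 below reports maximum likelihood estimates of $T_{\max}$; the exact likelihood is given in the Supp. Mat. and is not reproduced here. As a hedged sketch, assume multiplicative log-normal noise, $\log U = \log a + \log T(T_{\max}) + \varepsilon$ with Gaussian $\varepsilon$, which is consistent with the slope ≈ 1 reported above. Profiling out $a$ and the noise variance leaves a likelihood that depends on $T_{\max}$ alone, which can be maximised over a grid of candidate budgets. In practice the tie counts per budget would come from the model; here both they and the indicator values are made up, and the function names are hypothetical.

```python
import math


def profile_log_likelihood(log_u, log_t):
    """Gaussian log-likelihood of log U = log a + log T + eps after replacing
    log a and the noise variance by their maximum likelihood estimates."""
    n = len(log_u)
    resid = [lu - lt for lu, lt in zip(log_u, log_t)]
    log_a = sum(resid) / n                            # MLE of log a
    var = sum((r - log_a) ** 2 for r in resid) / n    # MLE of the noise variance
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0), log_a


def fit_t_max(u_by_city, ties_by_budget):
    """Grid search: return (t_max, log_likelihood, a) maximising the profile likelihood.

    ties_by_budget maps each candidate T_max to the model's tie counts T,
    one per city, in the same order as u_by_city.
    """
    log_u = [math.log(u) for u in u_by_city]
    best = None
    for t_max, ties in ties_by_budget.items():
        ll, log_a = profile_log_likelihood(log_u, [math.log(t) for t in ties])
        if best is None or ll > best[1]:
            best = (t_max, ll, math.exp(log_a))
    return best


# Made-up data for six cities: U is proportional to the T(1.0 h) counts, with noise.
ties_by_budget = {
    0.5: [10, 15, 20, 25, 30, 35],
    1.0: [10, 20, 40, 80, 160, 320],
    2.0: [100, 110, 130, 150, 200, 330],
}
noise = [1.01, 0.99, 1.02, 0.98, 1.00, 1.005]
u_by_city = [3.0 * t * e for t, e in zip(ties_by_budget[1.0], noise)]
t_max_hat, log_lik, a_hat = fit_t_max(u_by_city, ties_by_budget)
print(f"T_max = {t_max_hat} h, a = {a_hat:.3f}")
```

Bootstrapping the cities and repeating the grid search would give interval estimates of the kind quoted in Section 3.4.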


3.4 Evidence for the attribute-dependence of the travelling-time budget

In addition to its predictive performance shown above, and because of its agent-driven construction, our model can also shed light on the mechanistic origin of social interactions. For instance, the two examples above (GDP and HIV infection) highlight a marked difference in the underlying social dynamics across the two attributes considered, as seen from the corresponding maximum likelihood estimates of $T_{\max}$. We obtain $T_{\max} = 2.43$ h (95% C.I. [0.36 h, 5.42 h]) for GDP output versus a markedly lower value of $T_{\max} = 0.94$ h (95% C.I. [0.36 h, 1.52 h]) for HIV infection rates. The confidence intervals are given by quantiles from bootstrapped samples of the original data set (see Supp. Mat.).

Ignoring for the moment the small range of variation in $R^2$ values with $T_{\max}$, there are two immediate interpretations. First, our fits indicate that, in contrast to economically productive activities, it is unlikely that one would be willing to travel for more than 1.5 hours to engage in activities associated with HIV transmission. Second, as expected, GDP stems from a wide range of activities, leading to a more variable $T_{\max}$. Recognising and quantifying such differences in interpretable parameters and their variances, which would be missed by simple scaling arguments, is relevant to efforts to build both prosperous and healthy cities.

Nevertheless, despite the bootstrapped analysis giving confidence intervals for our $T_{\max}$ estimates, the small range of variation in $R^2$ suggests a level of redundancy in our model, with the constant of proportionality $a$ in (13) affording too much freedom. In order to increase the robustness of the model when applied to real data, we eliminate the proportionality parameter $a$ by considering relative increases of indicators, i.e., we consider the ratio $U_1/U_2$ of the economic indicators. This is illustrated in the next section, where we provide two examples of the application of this approach.

³ 2010 Census of Population and Housing & 2010 U.S. Metropolitan Statistical Area Distance Profiles, www.census.gov; www.microsoft.com/maps/

⁴ Centers for Disease Control and Prevention. HIV Surveillance Report, 2011; vol. 23. www.cdc.gov/hiv/topics/surveillance/resources/reports/. Feb 2013.

4 Applications of the social-tie model

To illustrate the applicability of our model, we examine two examples of large-scale transportation projects in the United Kingdom: High Speed Rail 2 (HS2) and London Crossrail.


4.1 The High Speed Rail 2 project 

HS2 is the proposed high-speed rail network connecting the major cities in Britain, from London in the south to the northern cities of Leeds, Manchester and beyond. In this section we focus on the first-phase link between London and Birmingham, which would reduce the one-way travel time from the current 84 minutes to 50 minutes. We treat the two cities as a single conurbation and omit the influence of the neighbouring regions; the results presented here should be interpreted in the light of this geographical treatment. In Figure ?? we plot the total and percentage increases in the number of ties as a function of $T_{\max}$. If we take the value $T_{\max} = 2.43$ h, which we inferred previously for the GDP-related travelling-time budget, the average economic boost induced by the presence of HS2 across the two cities would be ≈ 0.96%. A more robust approach is to consider a range of possible time budgets to evaluate the effect of uncertainty in $T_{\max}$ (see Supp. Mat.). For instance, assuming a uniform distribution over 1 h ≤ $T_{\max}$ ≤ 3 h, we obtain an increase in GDP of 0.80%. Interestingly, we observe a middle 'sweet spot' at $T_{\max}$ ≈ 2 h: at the lower tail, the journey times are still too long to tempt one to travel further, while at the upper tail, the effort is wasted on a population already willing to endure long commutes.
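The averaging over an uncertain budget can be sketched as follows, assuming (our reading of the text) that the percentage increase in ties is computed for each $T_{\max}$ and then averaged under a uniform distribution on [1, 3] h. `pct_increase` stands for any routine returning the model's percentage increase at a given budget; the bell-shaped curve below is a hypothetical stand-in that merely mimics a 'sweet spot' near 2 h and is not the HS2 result.

```python
import math


def average_over_uniform_budget(pct_increase, lo=1.0, hi=3.0, steps=200):
    """Mean of pct_increase(T_max) for T_max ~ Uniform[lo, hi] (hours),
    computed with the composite trapezoidal rule on `steps` intervals."""
    h = (hi - lo) / steps
    total = 0.5 * (pct_increase(lo) + pct_increase(hi))
    total += sum(pct_increase(lo + k * h) for k in range(1, steps))
    return total * h / (hi - lo)


# Hypothetical curve peaking at T_max = 2 h; NOT the HS2 model output.
def toy_curve(t_max):
    return 1.0 * math.exp(-((t_max - 2.0) ** 2) / 0.5)


print(f"peak value: {toy_curve(2.0):.2f}%")
print(f"uniform-average over [1, 3] h: {average_over_uniform_budget(toy_curve):.2f}%")
```

When the curve has an interior peak, the uniform average necessarily falls below the peak value, in line with the HS2 figures (0.80% for the uniform prior against 0.96% at $T_{\max}$ = 2.43 h).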


4.2 London Crossrail 


Crossrail is a high-frequency railway linking east and west London, currently under construction. Under the same $T_{\max}$ assumptions as for HS2 above, the projected impact of Crossrail on the London economy is a 0.3% increase in the city's GDP (with an increase of 0.61% for the uniform distribution of $T_{\max}$) (Figure ??). The percentage increases may appear small (< 1%), but this is by no means unexpected, for two reasons. First, the stated investment cost is itself a small fraction of London's GDP. Second, the modest boost is simply a reflection of the highly concentrated population density in the central regions and the extensive transport infrastructure already in place.

The availability of precise local geographical data allows us to further interrogate the model to determine the spatial distribution of local connectivities $T_i$ (11). Indeed, it is important to note that neither the current local connectivity levels nor the impact of Crossrail are evenly distributed or felt across the city (see Figure ??). As would be expected, the largest increases are found near railway stations, especially in London's suburbs. As we explore further in Supp. Mat. (see Fig. S5), there is a concentration of newly possible connections along the east-west extent of the city. More surprisingly, however, we observe a decrease across large areas along the orthogonal north-south axis, driven by falls in their relative accessibility: the rising tide of connectivity does not lift all boats. This effect may be unavoidable, but the ability to quantify and map its spatial extent allows one to anticipate and, possibly, alleviate its impact.

A north-south extension, Crossrail 2, has been mooted and is currently under study (see Supp. Mat. for details). As for Crossrail, the expected additional boost to GDP can be calculated; it is shown in Fig. 4b,d. Crucially, in line with one's intuition, the negative local impact is now distributed outside the areas surrounding the Crossrail 2 rail line.


5 Discussion 

Unlike typical social network and epidemiological studies that assume a fixed and known network structure within which various dynamical processes (e.g. spread of diseases) are constrained, our approach obtains interaction networks as induced structures that emerge from the application of our set of principles to different cities. In this sense, these interaction networks are unobserved structures, much like genealogical trees in population genetics [34]. Unlike random geometric graphs emerging in models of cities with uniform population distributions [35], our model incorporates agent-driven optimisation principles and physical constraints from the geometry and topology of each city. Hence, rather than functioning as input features for our model, these resulting networks capture, and are confined by, the make-up of the demographic and transport infrastructure data under study.

Although the unobservable nature of the underlying connectivity networks poses challenges for the direct validation of our model, the recent availability of large-scale location data from mobile phones appears to offer a wealth of possibilities for testing some of the model assumptions, e.g., the existence of travelling-time budgets $T_\mathrm{max}$ and their assumed uniformity across the population for each attribute. However, there are specific conditions that such empirical studies must fulfil. In particular, one should be able to identify, with reasonable certainty, the purpose and deliberateness of both the single journeys and the social ties observed. In this context, the growth of location-based and, crucially, activity-specific social networking services could provide valuable information [36], in contrast to simply relying on proximity information for social tie prediction [37].

As shown above, the overall connectivity $T$ is, on its own, a strong predictor for several urban indicators, and we have concentrated on this aspect in this paper. This is reassuring given the known ability of mean-field theory to capture basic trends on networks [35]. Nevertheless, further details and statistics (e.g. heterogeneity) of the obtained networks could be studied, as the mechanistic and constructive nature of our model provides the necessary information for extracting these additional features. We provide a short illustration of this process in the Supp. Mat. A natural extension of our model would be to propose and test the analogue of (13) with different network statistical measures in place of $T$.

The generic nature of the proposed framework and the increasing availability of geo-location and travel data ensure a broad and growing array of applications. These include gauging the robustness of a city to traffic congestion and measuring the cost of weather-related disruptions. Methodological extensions to the model might include, for instance, replacing travel time with a cost function incorporating spatial distance, financial cost and the time of day.

Our focus for most of this paper has been on the city as defined by civil administrative conventions. Since studies of cities are sensitive to the exact definition of a city itself [39, 40], there is the option of adopting one of the more nuanced alternative definitions that do not rely on arbitrary geographical boundaries [41]. However, the model itself is agnostic as to the source of the population variables $N_\mathrm{pop}$ or the travelling-time distance matrices $T_{ij}$, as indeed we have shown by treating the two cities of London and Birmingham as a single entity in our analysis above. Our approach can thus be applied to reflect the connectivity among geographic entities both on a larger scale (countries or larger geographical regions) and on a smaller scale (buildings or campuses). On such smaller scales, this approach can inform design to maximise the creative, social and economic benefits resulting from human encounters. Regardless of the context of application, it is not the actual spatial size but the extent perceived via travelling times that determines the connectivity of a system. Large cities may be great, but great cities most certainly look small.
Figure 1: Multilevel mobility network decomposition of urban interaction networks. In the multilayer mobility networks, the red and green nodes represent the origin and destination, respectively, of the particular directed edge in the city interaction network. The blue crosses indicate a transfer from one transport mode to another (e.g. walking to metro), where each cross on a given layer corresponds to another on a different layer. Note that the spatial position of each transfer node in each layer has no meaning other than to provide an indication of the spatial distance travelled in the corresponding mode.





[Figure 2 axis labels: log (no. of ties) (from theoretical sampling formula); log (population density); log (total tie-density)]



Figure 2: Validation of sampling procedure and empirical validation with HIV infection rates and GDP of 102 US Metropolitan Statistical Areas. a, Comparison of the total number of ties empirically counted according to the interaction model (y-axis) with the number of ties estimated from population samples of 120 simulated cities according to Eq. (10) (x-axis). The four colours (red, blue, green, purple) indicate population sizes of 300, 500, 800 and 1200, respectively. Further variation in the cities is created by imposing different population distributions, maximum travelling-time budgets, and transport infrastructure. The circles indicate the mean of 30 simulations and the vertical lines ±2 standard deviations. As shown, the sampling procedure provides a reasonably good estimate of the total number of ties. b, e, Power-law fits of urban indicators to population density. c, f, Linear fits of urban indicators to tie-density with $T_\mathrm{max}$ set at the maximum likelihood values (as indicated by the blue circles in d, g). d, g, Coefficient of determination of tie-density fits as a function of maximum travelling-time budget. The error values on the slope parameters indicate ±2 standard deviations. We note that for both urban indicators, the fits to total tie-density outperform the fits to population density.
Table 1: Travel speeds of four increasingly developed transport infrastructures; the first represents the trivial case (i.e. no infrastructure). The units are kilometres per hour.
Figure 3: High Speed Rail 2 (Phase 1) and its impact on the connectivity of UK cities. a, High Speed Rail 2 (Phase 1) route and the population densities of London and Birmingham. The blue line indicates the published proposed route of the first phase of HS2 (as of Dec 2013). The red diamonds indicate the locations of the rail stations in each city. The contour maps are derived from kernel density estimates of 1000 and 129 sample points in London and Birmingham, respectively. The ratio of the number of samples is chosen to reflect the relative sizes of the two cities. b,c, Impact of High Speed Rail 2 (Phase 1) on the connectivity of UK cities. The black curve indicates the connectivity without HS2. The red curves indicate the connectivity according to the planned improved travel times (50 mins between London and Birmingham). The grey curves in c indicate hypothetical travel times of 30, 40 and 60 minutes.
Figure 4: Impact of London Crossrail on city-wide and local connectivities. a,b, Impact of London Crossrail on the connectivity of London. The black curve indicates the present connectivity without Crossrail. The red curves indicate the connectivities according to the planned improved travel times from Crossrail (but without Crossrail 2). The blue curve in b shows the connectivity boost from including Crossrail 2 (metro-only option), a proposed project extension to include a north-south train link. c, Percentage change in local connectivity due to Crossrail. d, Percentage change in local connectivity due to London Crossrail 2 (metro-only option) relative to post-Crossrail. The heat-map scales indicate the percentage change in the total number of incoming and outgoing ties for each region. The red points indicate the Crossrail stations and the blue points the 12 proposed Crossrail 2 stations.
S1 Derivation of social-tie formulae

S1.1 Rank-based tie probability

Let $\{Z^{(i)}\}_{i=1}^{N_\mathrm{pop}}$ be a set of positive real-valued random variables representing a single attribute for individuals in a population of size $N_\mathrm{pop}$. We assume that the random variables are independent and identically distributed (i.i.d.) according to some distribution $q(z \mid z \in \mathbb{R}^{+})$. Let $T_{ij}$ be the distance matrix specifying the travelling-time distances between individuals, and $S_{ij}$ the temporal social-spheres given by the sets

$$ S_{ij} := \{\, k \mid T_{ik} < T_{ij} \,\}. \tag{S16} $$

The number of people that are closer to individual $i$ than $j$, as determined from $T_{ij}$, is represented by the rank matrix $n_{ij}$, which is simply the cardinality of the temporal spheres, i.e.

$$ n_{ij} := |S_{ij}|. \tag{S17} $$

By design of the proposed social interaction model, in the case where $T_\mathrm{max} \to \infty$, a directed tie from $i$ to $j$ is formed if and only if

$$ z^{(j)} > z^{(i)} \quad \text{and} \quad z^{(j)} > \max_{k \in S_{ij}} z^{(k)}, \tag{S18} $$

where $z^{(k)}$ is a realisation of $Z^{(k)}$ and $k \in S_{ij}$. Let $z_m$ be the maximum value obtained from $m$ random samples from $q(z)$, and let $P_m(c)$ be the probability that $z_m$ satisfies some condition $c$. Together, the two conditions in (S18) require $z^{(j)}$ to exceed the maximum of the $n_{ij}+1$ values $\{z^{(i)}\} \cup \{z^{(k)}\}_{k \in S_{ij}}$, so the probability of a directed tie from individual $i$ to $j$ is

$$ \mathrm{Prob}(i \to j) = \int_0^\infty P_1(>z)\, P_{n_{ij}+1}(=z)\, \mathrm{d}z. \tag{S19} $$

Since the population individuals are assumed to be i.i.d. w.r.t. $q(z)$, we have

$$ P_1(>z) = 1 - q(<z), \tag{S20a} $$

$$ P_{n_{ij}+1}(=z) = \frac{\mathrm{d}}{\mathrm{d}z}\,[q(<z)]^{n_{ij}+1} = (n_{ij}+1)\,[q(<z)]^{n_{ij}}\, q(z). \tag{S20b} $$

Substituting (S20) into (S19) and changing variables gives

$$ \mathrm{Prob}(i \to j) = \int_0^1 (n_{ij}+1)\left[ q(<z)^{n_{ij}} - q(<z)^{n_{ij}+1} \right] \mathrm{d}q(<z) = \frac{1}{n_{ij}+2}. \tag{S21} $$

This is Eq. (8) in the main text. Remarkably, it is independent of the specific attribute $Z$ under consideration. For large rank $n_{ij}$ and up to a constant of proportionality, this expression for the tie-formation probability closely resembles the original rank-based ansatz $\mathrm{Prob}(i \to j) = 1/n_{ij}$ in [17, 18]. The difference here is that the attribute-independence of the tie probability is an emergent feature rather than a theoretically unsupported assumption of universality.


S1.2 Social-tie sampling approximation

Following [18], we first assume a uniform population density $\rho$ and travelling-time budget $T_\mathrm{max}$. The density is measured in terms of number of individuals per travelling-time 'volume'. The number of ties $t_i(\rho)$ to a given individual $i$ is

$$ t_i(\rho) = \int_0^{T_\mathrm{max}} \frac{2\pi r \rho}{\rho \pi r^2 + 2}\, \mathrm{d}r = \ln(\rho \pi T_\mathrm{max}^2 + 2) - \ln 2 = \ln\!\left(\frac{S_i}{2} + 1\right), \tag{S22} $$

with $S_i = \rho \pi T_\mathrm{max}^2$ the size of the temporal social-sphere, i.e. the number of nodes reachable from node $i$. Here we first evaluate the probability of an individual at the origin finding another individual with a higher attribute value in a differential spherical volume whose radius is given by the minimum travelling-time distance on the underlying network. Expanding the radius of action on the network, we can then geometrically determine the expected number of ties by integrating up to an attribute-specific limit.
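A quick numerical check of (S21) can be run with the Python sketch below. It is an illustrative Monte Carlo experiment of our own, not part of the original analysis: the helper `tie_probability_mc` and the two attribute distributions (exponential and log-normal) are hypothetical choices. It encodes rule (S18) as "$z^{(j)}$ is the largest of $n_{ij}+2$ i.i.d. draws" and compares the estimated tie probability with $1/(n_{ij}+2)$.

```python
import numpy as np

def tie_probability_mc(n_ij, sampler, trials=200_000, seed=0):
    """Monte Carlo estimate of Prob(i -> j) under rule (S18).

    Column 0 holds z^(j); column 1 holds z^(i); the remaining n_ij columns hold
    the attributes z^(k) of the individuals in the temporal sphere S_ij.
    """
    rng = np.random.default_rng(seed)
    z = sampler(rng, (trials, n_ij + 2))
    # A tie forms when z^(j) beats both z^(i) and every z^(k), k in S_ij.
    return float(np.mean(z[:, 0] > z[:, 1:].max(axis=1)))

# Two hypothetical attribute distributions q(z) on the positive reals.
samplers = {
    "exponential": lambda rng, shape: rng.exponential(1.0, shape),
    "lognormal": lambda rng, shape: rng.lognormal(0.0, 1.0, shape),
}

for n in (0, 3, 10):
    for name, sampler in samplers.items():
        est = tie_probability_mc(n, sampler)
        print(f"n_ij={n:2d}  {name:11s}  MC={est:.4f}  1/(n_ij+2)={1 / (n + 2):.4f}")
```

Up to Monte Carlo error, both distributions should give the same estimate for each $n_{ij}$, which is the attribute-independence noted above.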

We now drop the dependence on the constant uniform density $\rho$; the allowance for a heterogeneous distribution is reflected in a varying $S_i$ for different nodes $i$. We replace $t_i(\rho)$ with $t_i$ to indicate this transition. The total number of ties $T$ in the population is, then, simply

$$ T = \sum_{i=1}^{N_\mathrm{pop}} t_i = N_\mathrm{pop} \times \frac{1}{N_\mathrm{pop}} \sum_{i=1}^{N_\mathrm{pop}} \ln\!\left(\frac{S_i}{2} + 1\right) \le N_\mathrm{pop} \ln\!\left(\frac{\bar{S}}{2} + 1\right), \tag{S23} $$

where $\bar{S} = (1/N_\mathrm{pop}) \sum_{i=1}^{N_\mathrm{pop}} S_i$ is the population average of the number of reachable nodes. The inequality follows from Jensen's inequality and the concavity of the logarithmic function.
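The closed form in (S22) and the Jensen bound in (S23) are easy to check numerically. The Python sketch below is illustrative only: the density $\rho$, the budget $T_\mathrm{max}$ and the log-normal spread of sphere sizes $S_i$ are arbitrary assumed values, and the function names are hypothetical.

```python
import numpy as np

def ties_uniform_density(rho, t_max, steps=100_000):
    """Midpoint-rule evaluation of the integral in (S22)."""
    h = t_max / steps
    r = (np.arange(steps) + 0.5) * h
    integrand = 2 * np.pi * r * rho / (rho * np.pi * r**2 + 2)
    return float(integrand.sum() * h)

def total_ties(sphere_sizes):
    """T = sum_i ln(S_i / 2 + 1), the left-hand side of (S23)."""
    s = np.asarray(sphere_sizes, dtype=float)
    return float(np.log(s / 2 + 1).sum())

def jensen_bound(sphere_sizes):
    """Upper bound N_pop * ln(S_bar / 2 + 1), the right-hand side of (S23)."""
    s = np.asarray(sphere_sizes, dtype=float)
    return float(s.size * np.log(s.mean() / 2 + 1))

rho, t_max = 50.0, 0.4          # assumed values; rho is per unit travelling-time 'area'
closed_form = np.log(rho * np.pi * t_max**2 / 2 + 1)
print(ties_uniform_density(rho, t_max), closed_form)

rng = np.random.default_rng(1)
S = rng.lognormal(mean=3.0, sigma=1.0, size=1000)   # hypothetical heterogeneous S_i
print(total_ties(S), "<=", jensen_bound(S))
```

The bound becomes an equality when all $S_i$ are equal, which is why the gap in (S23) grows with the heterogeneity of the population distribution.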

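However, obtaining the full set $\{S_i\}_{i=1}^{N_\mathrm{pop}}$ is neither possible nor practical for typically-sized cities. We therefore take a sample of $N_s$ points. Defining $a = N_\mathrm{pop}/N_s$, if the sample is representative of the population, we have

$$ \frac{\bar{S}}{\bar{n}} = a, \tag{S24} $$

where $\bar{n}$ is the average of the number of reachable nodes within the sample set, i.e. $\bar{n} = (1/N_s) \sum_{i=1}^{N_s} n_i$. We have

$$ \begin{aligned} N_\mathrm{pop} \ln\!\left(\frac{\bar{S}}{2} + 1\right) &= N_\mathrm{pop} \ln\!\left[\frac{a\bar{n}}{2}\left(1 + \frac{2}{a\bar{n}}\right)\right] \\ &= N_\mathrm{pop}\left[\ln\frac{a}{2} + \ln \bar{n} + \ln\!\left(1 + \frac{2}{a\bar{n}}\right)\right] \\ &\approx N_\mathrm{pop}\left[\ln\frac{a}{2} + \ln \bar{n} + \frac{2}{a\bar{n}}\right] \\ &\ge N_\mathrm{pop}\left[\frac{1}{N_s}\sum_{i=1}^{N_s} \ln\frac{a\, n_i}{2} + \frac{2}{a\bar{n}}\right], \end{aligned} \tag{S25} $$

where the approximation uses $\ln(1+x) \approx x$ for $a\bar{n} \gg 2$, and the inequality again follows from Jensen's inequality. Combining (S23) and (S25), we expect the two inequalities to cancel out approximately, giving

$$ T \approx \frac{N_\mathrm{pop}}{N_s} \sum_{i=1}^{N_s} \ln\!\left(\frac{N_\mathrm{pop}\, n_i}{2 N_s}\right) + \frac{2 N_s}{\bar{n}}. \tag{S26} $$

This is Eq. (10) in the main text.


S1.3 Local connectivity

The local connectivity is defined as half the sum of incoming and outgoing ties from a given location. Let $T_i$ represent the local connectivity of the location of individual $i$. By definition, we have

$$ T_i = \tfrac{1}{2}\left(T_i^\mathrm{from} + T_i^\mathrm{to}\right), \qquad \text{with} \quad \sum_{i=1}^{N_\mathrm{pop}} T_i = T. \tag{S27} $$

As in the case of global connectivity, the key is to approximate $T_i$ without having access to the population distance matrix. We estimate the outgoing and incoming contributions separately, beginning with the outgoing component. Following the reasoning behind (S25), we have

$$ T_i^\mathrm{from} = \ln\!\left(\frac{a}{2}\, n_i + 1\right). \tag{S28} $$

Quantifying the incoming ties is less straightforward, as there is no simple scaling from the sample. Instead, we perform the approximation in three stages on the basis of several reasonable assumptions. First, for two individuals $i$ and $j$ in our population sample, we approximate the true population rank $\tilde{n}_{ji}$ from the sample rank $n_{ji}$, i.e.

$$ \tilde{n}_{ji} = (a-1) + \tfrac{1}{2}(a-1) + a\, n_{ji} = a\left(n_{ji} + \tfrac{3}{2}\right) - \tfrac{3}{2}, \tag{S29} $$

where the three terms in the sum are, respectively, the scaled contributions from individuals $i$, $j$ and the $n_{ji}$ intervening samples⁵. Second, from (S21) and (S29), the probability of a directed tie from $j$ to $i$ is

$$ \mathrm{Prob}(j \to i) = \frac{1}{\tilde{n}_{ji} + 2} = \frac{1}{a\left(n_{ji} + \tfrac{3}{2}\right) + \tfrac{1}{2}}. \tag{S30} $$

A first approximation of the expected total incoming ties is the appropriately-scaled sum of all incoming tie probabilities from our sample, i.e.

$$ \tilde{T}_i^\mathrm{to} = a \sum_{j=1,\, j \ne i}^{N_s} \mathrm{Prob}(j \to i). \tag{S31} $$

However, imposing the consistency criterion $\sum_{i=1}^{N_s} T_i^\mathrm{to} = \sum_{i=1}^{N_s} T_i^\mathrm{from}$ requires a third step of scaling (S30) appropriately. Therefore, we have

$$ T_i^\mathrm{to} = \gamma\, a \sum_{j=1,\, j \ne i}^{N_s} \mathrm{Prob}(j \to i), \tag{S32} $$

with

$$ \gamma = \frac{\sum_{i=1}^{N_s} T_i^\mathrm{from}}{a \sum_{i=1}^{N_s} \sum_{j=1,\, j \ne i}^{N_s} \mathrm{Prob}(j \to i)}. \tag{S33} $$

Substituting (S28) and (S32) into (S27), we obtain Eq. (11) in the main text.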
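The sample-based estimators (S26) to (S33) can be sketched in Python as follows. This is our own illustrative implementation, not the authors' code: the helper names, the synthetic sample (150 points moving at a uniform 15 km/h) and the choice to admit only pairs within the budget $T_\mathrm{max}$ into the incoming sum are assumptions. Ranks follow (S16) and (S17) on the sample: `n[i, j]` counts sample points strictly closer to $i$ than $j$ is, excluding $i$ and $j$ themselves.

```python
import numpy as np

def rank_matrix(T):
    """Sample ranks: n[i, j] = #{k != i, j : T[i, k] < T[i, j]}."""
    closer = T[:, None, :] < T[:, :, None]        # closer[i, j, k] = T[i, k] < T[i, j]
    self_counted = np.diag(T)[:, None] < T        # k = i is counted whenever T[i, i] < T[i, j]
    return closer.sum(axis=2) - self_counted

def reachable_counts(T, t_max):
    """n_i: other sample points within the travelling-time budget of point i."""
    return (T <= t_max).sum(axis=1) - 1

def total_ties_s26(T, n_pop, t_max):
    """Sample-based estimate of the total number of ties, Eq. (S26)."""
    n_s = T.shape[0]
    n_i = reachable_counts(T, t_max)
    if np.any(n_i <= 0):
        raise ValueError("every sample point needs at least one reachable neighbour")
    return float((n_pop / n_s) * np.log(n_pop * n_i / (2 * n_s)).sum() + 2 * n_s / n_i.mean())

def local_connectivity(T, n_pop, t_max):
    """Local connectivity T_i at each sample location, Eqs. (S27)-(S33)."""
    n_s = T.shape[0]
    a = n_pop / n_s
    t_from = np.log(a * reachable_counts(T, t_max) / 2 + 1)        # (S28)
    n = rank_matrix(T)
    p = 1.0 / (a * (n.T + 1.5) + 0.5)                              # p[i, j] = Prob(j -> i), (S30)
    # Assumption: only pairs within the budget contribute; no self-ties.
    p = np.where((T.T <= t_max) & ~np.eye(n_s, dtype=bool), p, 0.0)
    t_to_raw = a * p.sum(axis=1)                                   # (S31)
    gamma = t_from.sum() / t_to_raw.sum()                          # (S33)
    return 0.5 * (t_from + gamma * t_to_raw)                       # (S32) into (S27)

rng = np.random.default_rng(0)
xy = rng.uniform(0, 20, size=(150, 2))                             # hypothetical locations (km)
T = np.linalg.norm(xy[:, None] - xy[None, :], axis=2) / 15.0       # hours at an assumed 15 km/h
T_loc = local_connectivity(T, n_pop=150_000, t_max=0.5)
print(total_ties_s26(T, n_pop=150_000, t_max=0.5), T_loc.min(), T_loc.max())
```

Because $\gamma$ enforces the consistency criterion, the local connectivities sum to the same total as the outgoing terms alone; the vectorised rank computation needs $O(N_s^3)$ memory, which is fine for a few hundred sample points.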


S2 Induced network structure from social dynamics

In this paper we have constructed a probability model for generating social-tie networks where the edges denote deliberate (i.e. planned, as opposed to random encounters) face-to-face interactions. It is worth re-emphasising that the networks throughout are themselves unobserved structures, which compels one to average over all possible networks. In this section we provide three instances of how our model can be coaxed to provide additional secondary expected network summary statistics. This is in addition to the expected number of interactions, i.e. the expected number of network edges, which we examined in the section above and have shown in the main text to be a sufficiently strong predictor for several urban indicators. Specifically, we look at network heterogeneity, multilevel network structures, and the spatial extent of spatial networks.

⁵ NB: $n_{ij} \ne n_{ji}$. We assume throughout that individual $i$ is the seeker, i.e. the recipient of incoming ties.

S2.1 Network heterogeneity

We focus on the impact that different population distributions and travelling-time budgets have on the expected network degree distributions. We simulate three cities following the procedure outlined in the section above. The first two cities are networks with 150 nodes uniformly distributed, with average $T_\mathrm{max} = 0.35$ and $T_\mathrm{max} = 0.35$ respectively, while the third is a network with 150 nodes sampled from a (1/3, 2/3)-weighted mixture of a uniform and a Gaussian distribution with component standard deviation of 4 km and average $T_\mathrm{max} = 0.5$. The travelling-time budgets were chosen such that the second and third networks possess a similar number of edges. In all three cities the transport infrastructure is assumed to have three modes and is represented by the speed vector

$$ \mathbf{v}' = (4.0, 15.0, 15.0, 33.0, 33.0, 33.0, 33.0, 33.0). \tag{S34} $$

As before, the values have units of kilometres per hour, and here the three distinct values represent the average speeds of walking, bus, and metro travel. Three example networks for a given population distribution of attribute values are shown in Figure S6. First, we observe, somewhat trivially, that for a single city an increase in $T_\mathrm{max}$ can lead to an increase in the number of edges. Second, for similar spatial distributions, we see that the network degree distributions of two cities can be markedly different, even when the numbers of edges are similar. Here we compare the expected degree distribution, taking the average over 120 random attribute-value ranking distributions. As shown in Figure S6d, the city with a dense centre has a significantly higher level of network heterogeneity than the uniformly distributed city.

S2.2 Spatial extent of spatial networks

Since the underlying interaction networks behind the connectivity measure are spatial networks, it can be useful to examine the impact of urban infrastructure changes not just on overall and local connectivities, but also on the spatial nature of those changes. In this section we use the example of London Crossrail from the main text. There we calculated both the impact on total connectivity and its local spatial variations. Here, we go one step further by predicting the expected distribution of the interaction network edge lengths in the city of London (in terms of Euclidean spatial distances) before and after the construction of London Crossrail.

The results are presented in Figure S7. We make three observations. First, the newly possible interactions (i.e. those with probability zero in the absence of Crossrail) tend to have longer edges on average than existing interactions. Second, the existing connections that are upgraded or downgraded in probability appear to have near-identical spatial length distributions. Third, the increases tend to occur along the new railway route, while the decreasing edges tend to have one or both nodes in the orthogonal dimension (i.e. the north-south corridor). The conclusion here is highly intuitive: apart from new connection possibilities between regions that are otherwise separated by large spatial distances, the changes in interaction probabilities depend less on the distance between nodes than on the nodes' locations relative to the new infrastructure.

S3 Details of empirical validation examples

102 US Metropolitan Statistical Areas (MSAs) were chosen on the basis of the availability of HIV infection rate, GDP, and spatial population distribution data.

S3.1 Data sources

The population statistics and density profiles of the 102 US MSAs are obtained from the U.S. Census Bureau⁶. HIV infection and prevalence data are obtained from the United States Centers for Disease Control and Prevention⁷. Travel times between city locations are obtained using Microsoft BING Maps⁸ for car journeys originating at 1200h local time on 13th Dec 2013.

Given the marginal radial distribution of the population, we assume circular symmetry about the central city hall location and sample a set of 1000 points for each US MSA. One can drop this assumption and obtain more accurate and precise population distribution data, for instance from detailed local census and other open data sources. The total number of ties $T_i$ in each MSA $i$ is then calculated by applying (S26) to the travelling-time distance matrices obtained from online mapping resources. The mode of transport here is restricted to travel by road, an assumption that is reasonable for many US MSAs.

In the example of HIV infection rates, the relevant number of ties is the number of encounters, $T'$, between HIV-positive and HIV-negative individuals, rather than the total number of ties. We therefore scale the total number of ties by the mixed-tie proportion, giving

$$ T' = \frac{2H(N-H)}{N(N-1)}\, T, \tag{S35} $$

where $N$ is the population size of a given MSA and $H$ the number of individuals diagnosed as HIV-positive, which we take to be equal to the reported number of HIV-positive individuals in the population.

⁶ 2010 Census of Population and Housing, 2010 U.S. Metropolitan Statistical Area Distance Profiles, www.census.gov
⁷ US Centers for Disease Control and Prevention. HIV Surveillance Report, 2011; vol. 23. www.cdc.gov/hiv/topics/surveillance/resources/reports/. Feb 2013.
⁸ www.microsoft.com/maps/
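The scaling factor in (S35) is the fraction of unordered pairs that contain exactly one HIV-positive individual: $H(N-H)$ mixed pairs out of $N(N-1)/2$. The short Python sketch below (the function name `mixed_ties` is hypothetical) applies it and checks the fraction by brute force on a tiny made-up population.

```python
from itertools import combinations

def mixed_ties(total_ties, n, h):
    """Scale the total number of ties T by the mixed-pair fraction in (S35).

    2H(N - H) / (N(N - 1)) is the fraction of unordered pairs of the N
    individuals that contain exactly one of the H HIV-positive individuals.
    """
    if n < 2 or not 0 <= h <= n:
        raise ValueError("require N >= 2 and 0 <= H <= N")
    return total_ties * 2 * h * (n - h) / (n * (n - 1))

# Brute-force check of the pair fraction on a tiny hypothetical population.
N, H = 10, 2
status = [1] * H + [0] * (N - H)                  # 1 = HIV-positive
pairs = list(combinations(status, 2))
frac = sum(a != b for a, b in pairs) / len(pairs)
print(frac, mixed_ties(1.0, N, H))                # both equal 16/45
```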


S3.2 Robustness of $T_\mathrm{max}$ estimation

In this section we gauge the robustness of the maximum likelihood travelling-time budget estimates obtained for the HIV infection rates and GDP-related attributes by constructing confidence intervals (C.I.s) around the respective point estimates. We present two versions: a bootstrap C.I., and a C.I. based on the asymptotic variance of the maximum likelihood estimator in terms of the observed Fisher information.

Using $N_b = 1000$ bootstrap replicates of the original set of 102 US cities, we repeat the linear fits of the urban indicators to tie-density. We obtain a set of bootstrap travelling-time maximum likelihood estimates $\{\hat{T}_\mathrm{max}^{(i)}\}_{i=1}^{N_b}$, which then provides a bootstrap confidence interval $C_\mathrm{boot}$ in terms of the empirical quantiles [42].
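The percentile bootstrap described above can be sketched in Python as follows. Everything here is an illustrative assumption rather than the authors' pipeline: the tie-densities are synthetic, $T_\mathrm{max}$ is estimated by a grid search over candidate budgets, and the linear fit $\log U = c_0 + c_1 \log T$ (with hypothetical coefficients $c_0, c_1$) is scored by its maximised Gaussian log-likelihood.

```python
import numpy as np

def profile_loglik(log_u, log_t):
    """Maximised Gaussian log-likelihood of the OLS fit log U = c0 + c1 log T."""
    X = np.column_stack([np.ones_like(log_t), log_t])
    coef, *_ = np.linalg.lstsq(X, log_u, rcond=None)
    s2 = np.mean((log_u - X @ coef) ** 2)          # ML estimate of the residual variance
    return -0.5 * log_u.size * (np.log(2 * np.pi * s2) + 1)

def mle_tmax(log_u, log_t_grid, tmax_grid):
    """Grid-search MLE of T_max; log_t_grid[g, c] = log tie-density of city c at tmax_grid[g]."""
    ll = [profile_loglik(log_u, row) for row in log_t_grid]
    return float(tmax_grid[int(np.argmax(ll))])

def bootstrap_ci(log_u, log_t_grid, tmax_grid, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap C.I. for T_max, resampling cities with replacement."""
    rng = np.random.default_rng(seed)
    n_cities = log_u.size
    est = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n_cities, size=n_cities)
        est[b] = mle_tmax(log_u[idx], log_t_grid[:, idx], tmax_grid)
    alpha = (1 - level) / 2
    return tuple(np.quantile(est, [alpha, 1 - alpha]))

# Synthetic stand-in for the 102 MSAs (all values hypothetical).
rng = np.random.default_rng(42)
rho = rng.lognormal(mean=3.0, sigma=1.0, size=102)
tmax_grid = np.linspace(0.1, 3.0, 59)
log_t_grid = np.log(np.log(rho[None, :] * np.pi * tmax_grid[:, None] ** 2 / 2 + 1))
log_u = 0.5 + 1.2 * log_t_grid[20] + rng.normal(0.0, 0.05, size=102)
print(mle_tmax(log_u, log_t_grid, tmax_grid), bootstrap_ci(log_u, log_t_grid, tmax_grid, n_boot=200))
```

Resampling whole cities keeps each city's tie-density curve intact, so every bootstrap replicate re-runs the full $T_\mathrm{max}$ search rather than perturbing a single fit.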

Next, we assume that the residuals of the linear fit of $\log U$ against $\log T$ are normally distributed, with $s$ the sample standard deviation of the residuals of the maximum likelihood fit. We further assume that the data points are independent, whereby the likelihood is

$$ \mathcal{L}(T_\mathrm{max}) = \prod_{i=1}^{n} f_{T_\mathrm{max}}(\log U_i), \tag{S36} $$

where $f_{T_\mathrm{max}}(\log U_i)$ is the univariate normal density function with mean given by the linear-fit prediction for city $i$ at travelling-time budget $T_\mathrm{max}$ and variance $s^2$. The observed Fisher information $I$ is then

$$ I = -\left.\frac{\mathrm{d}^2 L(T_\mathrm{max})}{\mathrm{d}T_\mathrm{max}^2}\right|_{T_\mathrm{max} = \hat{T}_\mathrm{max}}, \tag{S37} $$

where $L = \log \mathcal{L}$ is the log-likelihood and $\hat{T}_\mathrm{max}$ the maximum likelihood estimate. Practically, we obtain (S37) through a series of Gaussian Process fits through the set of empirical data points $\{T_\mathrm{max}, L(T_\mathrm{max})\}$. In the asymptotic limit, the maximum likelihood estimator is normally distributed with variance $1/I$, which is used to define the C.I. $C_\mathrm{mle}$. Strictly speaking, the asymptotic distribution cannot be exactly normal, as the parameter is constrained to $T_\mathrm{max} > 0$. However, at least for the GDP attribute, the maximum likelihood estimate is sufficiently far from the zero boundary for this to be a reasonable assumption.
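The observed-information interval can be sketched as below. Note the swapped-in technique: instead of the Gaussian Process fits used in the text, this hypothetical `fisher_ci` helper fits a quadratic to the log-likelihood near its grid maximum and reads off $I = -L''(\hat{T}_\mathrm{max})$; the demonstration log-likelihood is an assumed exact parabola.

```python
import numpy as np

def fisher_ci(tmax_grid, loglik, z=1.96, half_width=5):
    """Wald C.I. for T_max from the observed Fisher information (S37).

    The curvature is taken from a local quadratic fit around the grid maximum,
    a simple stand-in for the Gaussian Process fits described in the text.
    """
    tmax_grid = np.asarray(tmax_grid, dtype=float)
    loglik = np.asarray(loglik, dtype=float)
    g = int(np.argmax(loglik))
    lo, hi = max(g - half_width, 0), min(g + half_width + 1, tmax_grid.size)
    c2, c1, _ = np.polyfit(tmax_grid[lo:hi], loglik[lo:hi], deg=2)
    info = -2.0 * c2                      # I = -L''(T_hat) for the fitted parabola
    if info <= 0:
        raise ValueError("log-likelihood is not locally concave")
    t_hat = -c1 / (2.0 * c2)              # vertex of the parabola
    se = 1.0 / np.sqrt(info)              # asymptotic s.d. = sqrt(1 / I)
    return t_hat - z * se, t_hat + z * se

# Hypothetical log-likelihood curve: exactly quadratic with peak 1.2 h and s.d. 0.3 h.
grid = np.linspace(0.1, 3.0, 59)
ll = -0.5 * ((grid - 1.2) / 0.3) ** 2
print(fisher_ci(grid, ll))               # approximately (0.612, 1.788)
```

Consistent with the caveat above, a lower limit that falls below zero would in practice be truncated at the $T_\mathrm{max} > 0$ boundary.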

The C.I.s are illustrated in Figure S8 for both attributes. From the analysis we have the 95% C.I.s $C_\mathrm{boot} = [0.36, 1.52]$ and $C_\mathrm{boot} = [0.36, 5.42]$ ($C_\mathrm{mle} = [0.15, 4.65]$) for the HIV infection rates and GDP-related attributes, respectively. While the $T_\mathrm{max}$ estimate for the HIV infection rates attribute is fairly robust, the C.I. for the GDP estimate spans more than 4 hours. This behaviour confirms the intuition that the GDP indicator pertains, in reality, to an amalgamation of many attributes with varying sizes of $T_\mathrm{max}$. For instance, a typical city inhabitant is unlikely to patronise a laundromat more than a few minutes from home; on the other hand, the same person is probably willing to endure a long commute across the city for a one-off visit to a unique theme park, say. Both activities contribute to GDP, and this difference is reflected in the wide span of the $T_\mathrm{max}$ estimates.


S4 Details of HS2 and London Crossrail analysis

S4.1 Data sources 

London and Birmingham demographic profiles and geographical details are obtained from the Greater London Authority⁹ and the Birmingham City Council¹⁰, respectively. Details of HS2, including routes, station locations and travel speeds, are obtained from High Speed Two Limited¹¹. London Crossrail station and travel times are obtained from Transport for London¹². Current travelling times between city locations are obtained using Microsoft BING Maps¹³.

We obtain geographic samples from the cities of London and Birmingham, UK, from two-dimensional (weighted) kernel density estimates (KDE) of the population spatial distributions. The central locations of the 32 boroughs in London and 40 wards in Birmingham are treated as data points with weights proportional to the local population sizes. We use a Gaussian kernel with bandwidth equal to 1.2 times the radius of a circle with area equal to that of the local borough or ward for each data point. The population of each city is then sampled from this weighted mixture of Gaussians. We have a total of 1000 and 128 location samples for London and Birmingham, respectively.
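Sampling from the weighted KDE described above amounts to drawing from a population-weighted mixture of isotropic Gaussians. The Python sketch below is illustrative: the function name, the toy two-area city and its populations and areas are hypothetical, and coordinates are assumed to be in kilometres.

```python
import numpy as np

def sample_population(centres, populations, areas, n_samples, bw_factor=1.2, seed=0):
    """Sample locations from a population-weighted mixture of isotropic Gaussians.

    centres: (m, 2) borough/ward centroids in km; populations, areas (km^2): (m,).
    Component k has standard deviation bw_factor * sqrt(area_k / pi), i.e. a
    multiple of the radius of a circle with the same area as the borough or ward.
    """
    rng = np.random.default_rng(seed)
    centres = np.asarray(centres, dtype=float)
    weights = np.asarray(populations, dtype=float)
    weights = weights / weights.sum()
    sigma = bw_factor * np.sqrt(np.asarray(areas, dtype=float) / np.pi)
    comp = rng.choice(weights.size, size=n_samples, p=weights)     # component per sample
    return centres[comp] + rng.normal(size=(n_samples, 2)) * sigma[comp, None]

# Hypothetical two-area city: 75% of residents live around (0, 0).
pts = sample_population([(0, 0), (10, 0)], [3, 1], [1.0, 1.0], n_samples=1000)
print(pts.shape, pts[:, 0].mean())
```

Drawing a component first and then a Gaussian offset is the standard two-stage way to sample a mixture, so no density grid is needed.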

The travelling-time distance matrices used represent, for the majority of point pairs, public transport travelling times. In the absence of public transport data between two locations, we assume that the relevant journey is taken by car. As for the US MSA examples, the data are obtained from online mapping resources. We selected a departure time of 1200 on 12th December 2013.

For the HS2 example, we assume a single interchange station in each city (London Euston, and Curzon Street station in Birmingham). The travelling time between locations in the two cities is the sum of the travelling times to and from each station and the published journey time between the two cities (we do not factor in waiting times, delays, etc.).

⁹ http://data.london.gov.uk/datastore
¹⁰ www.birmingham.gov.uk
¹¹ www.hs2.org.uk
¹² www.crossrail.co.uk
¹³ www.microsoft.com/maps/
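The cross-city travelling times described above are an outer sum of access, rail and egress legs. The Python sketch below uses hypothetical names and made-up access/egress times, and, as in the text, ignores waiting times and delays.

```python
import numpy as np

def hs2_times(to_euston, from_curzon, rail_minutes=50.0):
    """London-to-Birmingham travelling times (minutes) via HS2, as a sum of legs.

    to_euston[i]:   time from London location i to London Euston
    from_curzon[j]: time from Curzon Street to Birmingham location j
    Waiting times and delays are ignored, as in the text.
    """
    access = np.asarray(to_euston, dtype=float)
    egress = np.asarray(from_curzon, dtype=float)
    return access[:, None] + rail_minutes + egress[None, :]

# Hypothetical access/egress times (minutes) for 2 London and 3 Birmingham points.
print(hs2_times([10, 25], [5, 15, 30]))   # rows: [65, 75, 90] and [80, 90, 105]
```

Re-running with `rail_minutes` set to 30, 40 or 60 mirrors the hypothetical scenarios shown as grey curves in Fig. 3c.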

There are 36 stations on the London Crossrail network, with an additional 12 in the Crossrail 2 (metro-only option) extension. The improved travelling time between two London locations is the sum of the travelling times from the origin and destination to their respective closest (by time) Crossrail stations and the published station-to-station journey time.
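The Crossrail routing rule can be sketched in Python as below. Two assumptions are ours and not stated in the text: the station leg takes the same time in both directions, and the improved time is the minimum of the current time and the route via the closest stations. All names and numbers are hypothetical.

```python
import numpy as np

def crossrail_times(t_current, t_to_station, t_station):
    """Travelling times after Crossrail, routing via each point's closest station.

    t_current:    (m, m) current travelling times between sample locations
    t_to_station: (m, s) travelling time from each location to each Crossrail station
    t_station:    (s, s) published station-to-station journey times
    Assumptions (ours): the station leg is symmetric in time, and the new
    route is used only when it beats the current one.
    """
    t_current = np.asarray(t_current, dtype=float)
    t_to_station = np.asarray(t_to_station, dtype=float)
    t_station = np.asarray(t_station, dtype=float)
    nearest = np.argmin(t_to_station, axis=1)                    # closest station by time
    access = t_to_station[np.arange(nearest.size), nearest]
    via_rail = access[:, None] + t_station[np.ix_(nearest, nearest)] + access[None, :]
    return np.minimum(t_current, via_rail)

# Hypothetical example: 3 locations, 2 stations, times in minutes.
t_now = [[0, 60, 90], [60, 0, 40], [90, 40, 0]]
t_access = [[5, 30], [20, 10], [40, 8]]
t_rail = [[0, 20], [20, 0]]
print(crossrail_times(t_now, t_access, t_rail))   # rows: [0, 35, 33], [35, 0, 18], [33, 18, 0]
```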
Figure S5: Density maps of travelling-time social spheres in London (with Crossrail) as a function of $T_\mathrm{max}$ and location. The coloured square, circle and triangle represent example central, western and southern locations in London, respectively. The contour maps represent kernel density estimates of samples ($N_s = 1000$) within the indicated travelling-time distance budget. The western location in b lies directly on a Crossrail station (West Drayton station), while the southern location (South Croydon station) in c is chosen to illustrate a relatively inaccessible location in the city. See SI Section S4 for details of the construction of the distance matrices used.


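[Figure S6 panel titles: a, Uniform, $T_\mathrm{max}$ = 0.35 h; b, Uniform, $T_\mathrm{max}$ = 0.35 h; c, Unif + Gaussian, $T_\mathrm{max}$ = 0.35 h; d, network degree distribution; scale bar 45 km]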


Figure S6: Emergence of network structure. a,b,c, Simulated city interaction network examples. The red nodes represent individuals, while the network edges indicate a directed social tie (direction not shown). The nodes in networks a and b are uniformly distributed, while those in network c are sampled from a (1/3, 2/3)-weighted mixture of a uniform and a Gaussian distribution with component standard deviation of 4 km. d, The network degree distributions, averaged over 120 different (and random) attribute-value distributions, for the three simulated cities.
Figure S7: Crossrail effect on the London interaction network. a, Newly possible interaction edges. b, Existing possible interactions that have increased in probability. c, Existing possible interactions that have decreased in probability. The three networks are taken from a subnetwork with 70 nodes. d, Expected distribution of the interaction network edge lengths for the three classes of interactions. The edge lengths are given in terms of the spatial Euclidean distances between nodes.



Figure S8: Robustness of $T_\mathrm{max}$ estimates. Plots of rescaled values of the log U − log T linear fits as a function of $T_\mathrm{max}$. The black solid line is the curve using the original dataset of 102 cities. Each of the 1000 grey curves represents the values of a separate bootstrap sample of the original data, rescaled such that $T_\mathrm{max}$ matches the value from the original dataset. The red crosses indicate the maximum values of the bootstrap curves and the blue circle the same for the original curve. The green dashed and purple dot-dashed lines indicate the 95% bootstrap confidence interval and the observed Fisher information-derived confidence interval, respectively.